Abstract. We study Recursive Concurrent Stochastic Games (RCSGs), extending our
recent analysis of recursive simple stochastic games to a concurrent setting where the 
two players choose moves simultaneously and independently at each state. For multi-exit 
games, our earlier work already showed undecidability for basic questions like termination, 
thus we focus on the important case of single-exit RCSGs (1-RCSGs). 

We first characterize the value of a 1-RCSG termination game as the least fixed point so- 
lution of a system of nonlinear minimax functional equations, and use it to show PSPACE 
decidability for the quantitative termination problem. We then give a strategy improvement
technique, which we use to show that player 1 (maximizer) has ε-optimal randomized
Stackless & Memoryless (r-SM) strategies for all ε > 0, while player 2 (minimizer) has
optimal r-SM strategies. Thus, such games are r-SM-determined. These results mirror and
generalize in a strong sense the randomized memoryless determinacy results for finite sto- 
chastic games, and extend the classic Hoffman-Karp strategy improvement approach from 
the finite to an infinite state setting. The proofs in our infinite-state setting are very dif- 
ferent however, relying on subtle analytic properties of certain power series that arise from 
studying 1-RCSGs. 

We show that our upper bounds, even for qualitative (probability 1) termination, cannot
be improved, even to NP, without a major breakthrough, by giving two reductions: first
a P-time reduction from the long-standing square-root sum problem to the quantitative 
termination decision problem for finite concurrent stochastic games, and then a P-time 
reduction from the latter problem to the qualitative termination problem for 1-RCSGs. 



1. Introduction

In recent work we have studied Recursive Markov Decision Processes (RMDPs) and
turn-based Recursive Simple Stochastic Games (RSSGs) ([16, 17]), providing a number of strong
upper and lower bounds for their analysis. These define infinite-state (perfect information)
stochastic games that extend Recursive Markov Chains (RMCs) ([14, 15]) with
non-probabilistic actions controlled by players. Here we extend our study to Recursive
Concurrent Stochastic Games (RCSGs), where the two players choose moves simultaneously and
independently at each state, unlike RSSGs where only one player can move at each state.
RCSGs define a class of infinite-state zero-sum (imperfect information) stochastic games 
that can naturally model probabilistic procedural programs and other systems involving 
both recursive and probabilistic behavior, as well as concurrent interactions between the 
system and the environment. Informally, all such recursive models consist of a finite col- 
lection of finite state component models (of the same type) that can call each other in a 
potentially recursive manner. For RMDPs and RSSGs with multiple exits (terminating 
states), our earlier work already showed that basic questions such as almost sure termina- 
tion (i.e. does player 1 have a strategy that ensures termination with probability 1) are 
already undecidable; on the other hand, we gave strong upper bounds for the important 
special case of single-exit RMDPs and RSSGs (called 1-RMDPs and 1-RSSGs).

Our focus in this paper is thus on single-exit Recursive Concurrent Stochastic Games 
(1-RCSGs for short). These models correspond to a concurrent game version of multi-type 
Branching Processes and Stochastic Context-Free Grammars, both of which are important 
and extensively studied stochastic processes with many applications including in population 
genetics, nuclear chain reactions, computational biology, and natural language processing 
(see, e.g., [2TJ E3 El] and other references in [TH H]). It is very natural to consider 
game extensions to these stochastic models. Branching processes model the growth of a 
population of entities of distinct types. In each generation each entity of a given type gives 
rise, according to a probability distribution, to a multi-set of entities of distinct types. A 
branching process can be mapped to a 1-exit Recursive Markov Chain (1-RMC) such that 
the probability of eventual extinction of a species is equal to the probability of termination 
in the 1-RMC. Modeling the process in a context where external agents can influence the 
evolution to bias it towards extinction or towards survival leads naturally to a game. A 
1-RCSG models the process where the evolution of some types is affected by the concurrent 
actions of external favorable and unfavorable agents (forces). 

In [16], we showed that for the turn-based 1-RSSG termination game, where the
goal of player 1 (respectively, player 2) is to maximize (resp. minimize) the probability 
of termination starting at a given vertex (in the empty calling context), we can decide 
in PSPACE whether the value of the game is > p for a given probability p, and we can 
approximate this value (which can be irrational) to within given precision with the same 
complexity. We also showed that both players have optimal deterministic Stackless and 
Memoryless (SM) strategies in the 1-RSSG termination game; these are strategies that 
depend neither on the history of the game nor on the call stack at the current state. Thus 
from each vertex belonging to the player, such a strategy deterministically picks one of the 
outgoing transitions. 

Already for finite-state concurrent stochastic games (CSGs), even under the simple 
termination objective, the situation is rather different. Memoryless strategies do suffice 
for both players, but randomization of strategies is necessary, meaning we can't hope for
deterministic ε-optimal strategies for either player. Moreover, player 1 (the maximizer)
can only attain ε-optimal strategies, for ε > 0, whereas player 2 (the minimizer) does have
optimal randomized memoryless strategies (see, e.g., [19, 12]). Another important result for
finite CSGs is the classic Hoffman-Karp [22] strategy improvement method, which provides, 
via simple local improvements, a sequence of randomized memoryless strategies which yield 
payoffs that converge to the value of the game. 

Here we generalize all these results to the infinite-state setting of 1-RCSG termination 
games. We first characterize values of the 1-RCSG termination game as the least fixed 
point solution of a system of nonlinear minimax functional equations. We use this to show
PSPACE decidability for the qualitative termination problem (is the value of the game 
= 1?) and the quantitative termination problem (is the value of the game > r (or < r, etc.), 
for given rational r), as well as PSPACE algorithms for approximating the termination 
probabilities of 1-RCSGs to within a given number of bits of precision, via results for the 
existential theory of reals. (The simpler "qualitative problem" of deciding whether the game
value is = 0 only depends on the transition structure of the 1-RCSG and not on the specific
probabilities. For this problem we give a polynomial time algorithm.)

We then proceed to our technically most involved result, a strategy improvement tech- 
nique for 1-RCSG termination games. We use this to show that in these games player 1 
(maximizer) has ε-optimal randomized Stackless & Memoryless (r-SM for short) strategies,
whereas player 2 (minimizer) has optimal r-SM strategies. Thus, such games are r-SM- 
determined. These results mirror and generalize in a very strong sense the randomized 
memoryless determinacy results known for finite stochastic games. Our technique extends 
Hoffman-Karp's strategy improvement method for finite CSGs to an infinite state setting. 
However, the proofs in our infinite-state setting are very different. We rely on subtle analytic 
properties of certain power series that arise from studying 1-RCSGs. 

Note that our PSPACE upper bounds for the quantitative termination problem for 
1-RCSGs cannot be improved to NP without a major breakthrough, since already for
1-RMCs we showed in [14] that the quantitative termination problem is at least as hard as
the square-root sum problem (see [H]). In fact, here we show that even the qualitative 
termination problem for 1-RCSGs, where the problem is to decide whether the value of 
the game is exactly 1, is already as hard as the square-root sum problem, and moreover, 
so is the quantitative termination decision problem for finite CSGs. We do this via two 
reductions: we give a P-time reduction from the square-root sum problem to the quantitative 
termination decision problem for finite CSGs, and a P-time reduction from the quantitative 
finite CSG termination problem to the qualitative 1-RCSG termination problem. 

It is known ([6]) that for finite concurrent games, probabilistic nodes do not add any 
power to these games, because the stochastic nature of the games can in fact be simulated 
by concurrency alone. The same is true for 1-RCSGs. Specifically, given a finite CSG (or 1- 
RCSG), G, there is a P-time reduction to a finite concurrent game (or 1-RCG, respectively) 
F(G), without any probabilistic vertices, such that the value of the game G is exactly the 
same as the value of the game F(G). We will provide a proof of this in Section 2 for
completeness. 

Related work. Stochastic games go back to Shapley [28], who considered finite concurrent
stochastic games with (discounted) rewards. See, e.g., [19] for a recent book on stochastic 
games. Turn-based "simple" finite stochastic games were studied by Condon [10]. As men- 
tioned, we studied RMDPs and (turn-based) RSSGs and their quantitative and qualitative 
termination problems in [16, 17]. In [17] we showed that the qualitative termination problem
for both maximizing and minimizing 1-RMDPs is in P, and for 1-RSSGs is in NP ∩ coNP.
Our earlier work [14, 15] developed theory and algorithms for Recursive Markov Chains
(RMCs), and [13, 3] have studied probabilistic Pushdown Systems which are essentially
equivalent to RMCs. 

Finite-state concurrent stochastic games have been studied extensively in recent CS 
literature (see, e.g., [7J Q2J CD])- In particular, the papers [8] and [7] have studied, for 
finite CSGs, the approximate reachability problem and approximate parity game problem, 
respectively. In those papers, it was claimed that these approximation problems are in
NP ∩ coNP. Actually there was a minor problem with the way the results on approximation
were phrased in [8, 7], as pointed out in the conference version of this paper [18], but this is
a relatively unimportant point compared to the flaw we shall now discuss. There is in fact a
serious flaw in a key proof of [8]. The flaw relates to the use of a result from [19] which shows
that for discounted stochastic games the value function is Lipschitz continuous with respect
to the coefficients that define the game as well as the discount β. Importantly, the Lipschitz
constant in this result from [19] depends on the discount β (it is inversely proportional
to 1 − β). This fact was unfortunately overlooked in [8] and, at a crucial point in their
proofs, the Lipschitz constant was assumed to be a fixed constant that does not depend
on β. This flaw unfortunately affects several results in [8]. It also affects the results of [7],
since the later paper uses the reachability results of [8]. As a consequence of this error, the
best upper bound which currently follows from the results in [8] [T^[ [7] is a PSPACE upper
bound for the decision and approximation problems for the value of finite-state concurrent
stochastic reachability games as well as for finite-state concurrent stochastic parity games.
(See the erratum note for [8] on K. Chatterjee's web page [9], as well as his Ph.D. thesis.) It
is entirely plausible that these results can be repaired and that approximating the value of
finite-state concurrent reachability games to within a given additive error ε > 0 can in the
future be shown to be in NP ∩ coNP, but the flaw in the proof given in [8] is fundamental
and does not appear to be easy to fix.

On the other hand, for the quantitative decision problem for finite CSGs (as opposed to 
the approximation problem), and even the qualitative decision problem for 1-RCSGs, the 
situation is different. We show here that the quantitative decision problem for finite CSGs, 
as well as the qualitative decision problem for 1-RCSGs, are both as hard as the square-root 
sum problem, for which containment even in NP is a long-standing open problem. Thus our
PSPACE upper bounds here, even for the qualitative termination problem for 1-RCSGs, cannot
be improved to NP without a major breakthrough. Unlike for 1-RCSGs, the qualitative
termination problem for finite CSGs is known to be decidable in P-time ([LT]). We note 
that in recent work Allender et al. [1] have shown that the square-root sum problem is in
(the 4th level of) the "Counting Hierarchy" CH, which is inside PSPACE, but it remains a 
major open problem to bring this complexity down to NP. 

The rest of the paper is organized as follows. In Section 2 we present the RCSG model, 
define the problems that we will study, and give some basic properties. In Section 3 we 
give a system of equations that characterizes the desired probabilities, and use them to 
show that the problems are in PSPACE. In Section 4 we prove the existence of optimal 
randomized stackless and memoryless strategies, and we present a strategy improvement 
method. Finally in Section 5 we present reductions from the square root sum problem to 
the quantitative termination problem for finite CSGs, and from the latter to the qualitative 
problem for Recursive CSGs. 

2. Basics 

We have two players, Player 1 and Player 2. Let $\Gamma_1$ and $\Gamma_2$ be finite sets constituting
the move alphabet of players 1 and 2, respectively. Formally, a Recursive Concurrent
Stochastic Game (RCSG) is a tuple $A = (A_1, \ldots, A_k)$, where each component
$A_i = (N_i, B_i, Y_i, En_i, Ex_i, pl_i, \delta_i)$ consists of:

(1) A finite set $N_i$ of nodes, with a distinguished subset $En_i$ of entry nodes and a (disjoint)
subset $Ex_i$ of exit nodes.
Figure 1: Example (1-exit) RCSG



(2) A finite set $B_i$ of boxes, and a mapping $Y_i : B_i \mapsto \{1, \ldots, k\}$ that assigns to every box
(the index of) a component. To each box $b \in B_i$, we associate a set of call ports,
$Call_b = \{(b, en) \mid en \in En_{Y(b)}\}$, and a set of return ports, $Return_b = \{(b, ex) \mid ex \in Ex_{Y(b)}\}$.
Let $Call^i = \cup_{b \in B_i} Call_b$, $Return^i = \cup_{b \in B_i} Return_b$, and let $Q_i = N_i \cup Call^i \cup Return^i$
be the set of all nodes, call ports and return ports; we refer to these as the vertices of
component $A_i$.

(3) A mapping $pl_i : Q_i \mapsto \{0, play\}$ that assigns to every vertex $u$ a type describing how
the next transition is chosen: if $pl_i(u) = 0$ it is chosen probabilistically and if $pl_i(u) = play$
it is determined by moves of the two players. Vertices $u \in (Ex_i \cup Call^i)$ have no
outgoing transitions; for them we let $pl_i(u) = 0$.

(4) A transition relation $\delta_i \subseteq (Q_i \times (\mathbb{R} \cup (\Gamma_1 \times \Gamma_2)) \times Q_i)$, where for each tuple $(u, x, v) \in \delta_i$,
the source $u \in (N_i \setminus Ex_i) \cup Return^i$, the destination $v \in (N_i \setminus En_i) \cup Call^i$, where if
$pl(u) = 0$ then $x$ is a real number $p_{u,v} \in [0, 1]$ (the transition probability), and if $pl(u) = play$
then $x = (\gamma_1, \gamma_2) \in \Gamma_1 \times \Gamma_2$. We assume that each vertex $u \in Q_i$ has associated
with it a set $\Gamma_1^u \subseteq \Gamma_1$ and a set $\Gamma_2^u \subseteq \Gamma_2$, which constitute player 1 and 2's legal moves at
vertex $u$. Thus, if $(u, x, v) \in \delta_i$ and $x = (\gamma_1, \gamma_2)$ then $(\gamma_1, \gamma_2) \in \Gamma_1^u \times \Gamma_2^u$. Additionally,
for each vertex $u$ and each $x \in \Gamma_1^u \times \Gamma_2^u$, we assume there is exactly one transition of the
form $(u, x, v)$ in $\delta_i$. Furthermore they must satisfy the consistency property: for every
$u \in pl^{-1}(0)$, $\sum_{\{v' \mid (u, p_{u,v'}, v') \in \delta_i\}} p_{u,v'} = 1$, unless $u$ is a call port or exit node, neither of
which have outgoing transitions, in which case by default $\sum_{v'} p_{u,v'} = 0$.

We use the symbols ($N$, $B$, $Q$, $\delta$, etc.) without a subscript, to denote the union over all
components. Thus, e.g., $N = \cup_{i=1}^k N_i$ is the set of all nodes of $A$, $\delta = \cup_{i=1}^k \delta_i$ the set of all
transitions, $Q = \cup_{i=1}^k Q_i$ the set of all vertices, etc. The set $Q$ of vertices is partitioned
into the sets $Q_{play} = pl^{-1}(play)$ and $Q_{prob} = pl^{-1}(0)$ of play and probabilistic vertices
respectively.

For computational purposes we assume that the transition probabilities $p_{u,v}$ are rational,
given in the input as the ratio of two integers written in binary. The size of an RCSG is the
space (in number of bits) needed to specify it fully, i.e., the nodes, boxes, and transitions 
of all components, including the probabilities of all the transitions. 
Example 1. An example picture of a (1-exit) RCSG is depicted in Figure 1. This RCSG
has one component, $f$, which has nodes $\{s, t, u_1, u_2, u_3, u_4, u_5\}$. It has one entry node, $s$, and
one exit node, $t$. It also has two boxes, $\{b_1, b_2\}$, both of which map to the only component,
$f$. All nodes in this RCSG are probabilistic (black nodes) except for nodes $u_1$ and $u_4$ which
are player nodes (white nodes). The move alphabet for both players is $\{L, R\}$ (for, say,
"left" and "right"). At node $u_1$ both players have both moves enabled. At node $u_4$, player
1 has only L enabled, and player 2 has both L and R enabled. □

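An RCSG $A$ defines a global denumerable stochastic game $M_A = (V, \Delta, pl)$ as follows.
The global states $V \subseteq B^* \times Q$ of $M_A$ are pairs of the form $\langle \beta, u \rangle$, where $\beta \in B^*$ is a (possibly
empty) sequence of boxes and $u \in Q$ is a vertex of $A$. More precisely, the states $V \subseteq B^* \times Q$
and transitions $\Delta$ are defined inductively as follows:

(1) $\langle \epsilon, u \rangle \in V$, for $u \in Q$ ($\epsilon$ denotes the empty string.)

(2) If $\langle \beta, u \rangle \in V$ and $(u, x, v) \in \delta$, then $\langle \beta, v \rangle \in V$ and $(\langle \beta, u \rangle, x, \langle \beta, v \rangle) \in \Delta$.

(3) If $\langle \beta, (b, en) \rangle \in V$, with $(b, en) \in Call_b$, then $\langle \beta b, en \rangle \in V$ and
$(\langle \beta, (b, en) \rangle, 1, \langle \beta b, en \rangle) \in \Delta$.

(4) If $\langle \beta b, ex \rangle \in V$, and $(b, ex) \in Return_b$, then $\langle \beta, (b, ex) \rangle \in V$ and
$(\langle \beta b, ex \rangle, 1, \langle \beta, (b, ex) \rangle) \in \Delta$.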

Item 1 corresponds to the possible initial states, item 2 corresponds to control staying
within a component, item 3 is when a new component is entered via a box, item 4 is when
control exits a box and returns to the calling component. The mapping $pl : V \mapsto \{0, play\}$
is given by $pl(\langle \beta, u \rangle) = pl(u)$. The set of vertices $V$ is partitioned into $V_{prob}$, $V_{play}$, where
$V_{prob} = pl^{-1}(0)$ and $V_{play} = pl^{-1}(play)$.

We consider $M_A$ with various initial states of the form $\langle \epsilon, u \rangle$, denoting this by $M_A^u$.
Some states of $M_A$ are terminating states and have no outgoing transitions. These are
states $\langle \epsilon, ex \rangle$, where $ex$ is an exit node. If we wish to view $M_A$ as a non-terminating
CSG, we can consider the terminating states as absorbing states of $M_A$, with a self-loop of
probability 1.

An RCSG where $|\Gamma_2| = 1$ (i.e., where player 2 has only one action) is called a maximizing
Recursive Markov Decision Process (RMDP); likewise, when $|\Gamma_1| = 1$ the RCSG is
a minimizing RMDP. An RSSG where $|\Gamma_1| = |\Gamma_2| = 1$ is essentially a Recursive Markov
Chain ([14, 15]).

Our goal is to answer termination questions for RCSGs of the form: "Does player 1
have a strategy to force the game to terminate (i.e., reach node $\langle \epsilon, ex \rangle$), starting at $\langle \epsilon, u \rangle$,
with probability > p, regardless of how player 2 plays?".

First, some definitions: a strategy $\sigma$ for player $i$, $i \in \{1, 2\}$, is a function $\sigma : V^* V_{play} \mapsto \mathcal{D}(\Gamma_i)$,
where $\mathcal{D}(\Gamma_i)$ denotes the set of probability distributions on the finite set of moves $\Gamma_i$.
In other words, given a history $ws \in V^* V_{play}$, and a strategy $\sigma$ for, say, player 1, $\sigma(ws)(\gamma)$
defines the probability with which player 1 will play move $\gamma$. Moreover, we require that
the function $\sigma$ has the property that for any global state $s = \langle \beta, u \rangle$, with $pl(u) = play$,
$\sigma(ws) \in \mathcal{D}(\Gamma_i^u)$. In other words, the distribution has support only over eligible moves at
vertex $u$.

Let $\Psi_i$ denote the set of all strategies for player $i$. Given a history $ws \in V^* V_{play}$ of
play so far, and given a strategy $\sigma \in \Psi_1$ for player 1, and a strategy $\tau \in \Psi_2$ for player 2, the
strategies determine a distribution on the next move of play to a new global state, namely,
the transition $(s, (\gamma_1, \gamma_2), s') \in \Delta$ has probability $\sigma(ws)(\gamma_1) * \tau(ws)(\gamma_2)$. This way, given
a start node $u$, a strategy $\sigma \in \Psi_1$, and a strategy $\tau \in \Psi_2$, we define a new Markov chain
(with initial state $u$) $M_A^{u,\sigma,\tau} = (S, \Delta')$. The states $S \subseteq \langle \epsilon, u \rangle V^*$ of $M_A^{u,\sigma,\tau}$ are non-empty
sequences of states of $M_A$, which must begin with $\langle \epsilon, u \rangle$. Inductively, if $ws \in S$, then: (0)
if $s \in V_{prob}$ and $(s, p_{s,s'}, s') \in \Delta$ then $wss' \in S$ and $(ws, p_{s,s'}, wss') \in \Delta'$; (1) if $s \in V_{play}$,
where $(s, (\gamma_1, \gamma_2), s') \in \Delta$, then if $\sigma(ws)(\gamma_1) > 0$ and $\tau(ws)(\gamma_2) > 0$ then $wss' \in S$ and
$(ws, p, wss') \in \Delta'$, where $p = \sigma(ws)(\gamma_1) * \tau(ws)(\gamma_2)$.

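Given initial vertex $u$, and final exit $ex$ in the same component, and given strategies
$\sigma \in \Psi_1$ and $\tau \in \Psi_2$, for $k \geq 0$, let $q^{k,\sigma,\tau}_{(u,ex)}$ be the probability that, in $M_A^{u,\sigma,\tau}$, starting at initial
state $\langle \epsilon, u \rangle$, we will reach a state $w \langle \epsilon, ex \rangle$ in at most $k$ "steps" (i.e., where $|w| \leq k$). Let
$q^{*,\sigma,\tau}_{(u,ex)} = \lim_{k \to \infty} q^{k,\sigma,\tau}_{(u,ex)}$ be the probability of ever terminating at $ex$, i.e., reaching $\langle \epsilon, ex \rangle$.
(Note, the limit exists: it is a monotonically non-decreasing sequence bounded by 1). Let
$q^{k}_{(u,ex)} = \sup_{\sigma \in \Psi_1} \inf_{\tau \in \Psi_2} q^{k,\sigma,\tau}_{(u,ex)}$ and let $q^{*}_{(u,ex)} = \sup_{\sigma \in \Psi_1} \inf_{\tau \in \Psi_2} q^{*,\sigma,\tau}_{(u,ex)}$. For a strategy
$\sigma \in \Psi_1$, let $q^{k,\sigma}_{(u,ex)} = \inf_{\tau \in \Psi_2} q^{k,\sigma,\tau}_{(u,ex)}$, and let $q^{*,\sigma}_{(u,ex)} = \inf_{\tau \in \Psi_2} q^{*,\sigma,\tau}_{(u,ex)}$. Lastly, given a strategy
$\tau \in \Psi_2$, let $q^{k,\cdot,\tau}_{(u,ex)} = \sup_{\sigma \in \Psi_1} q^{k,\sigma,\tau}_{(u,ex)}$, and let $q^{*,\cdot,\tau}_{(u,ex)} = \sup_{\sigma \in \Psi_1} q^{*,\sigma,\tau}_{(u,ex)}$.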

From general determinacy results (e.g., "Blackwell determinacy" [26] which applies to
all Borel two-player zero-sum stochastic games with countable state spaces; see also [25]) it
follows that the games $M_A$ are determined, meaning:
$\sup_{\sigma \in \Psi_1} \inf_{\tau \in \Psi_2} q^{*,\sigma,\tau}_{(u,ex)} = \inf_{\tau \in \Psi_2} \sup_{\sigma \in \Psi_1} q^{*,\sigma,\tau}_{(u,ex)}$.

We call a strategy $\sigma$ for either player a (randomized) Stackless and Memoryless (r-SM)
strategy if it neither depends on the history of the game, nor on the current call stack. In
other words, an r-SM strategy $\sigma$ for player $i$ is given by a function $\sigma : Q_{play} \mapsto \mathcal{D}(\Gamma_i)$, which
maps each play vertex $u$ of the RCSG to a probability distribution $\sigma(u) \in \mathcal{D}(\Gamma_i^u)$ on the
moves available to player $i$ at vertex $u$.

We are interested in the following computational problems. 

(1) The qualitative termination problem: Is $q^*_{(u,ex)} = 1$?

(2) The quantitative termination (decision) problem:
given $r \in [0, 1]$, is $q^*_{(u,ex)} \geq r$? Is $q^*_{(u,ex)} < r$?

The approximate version: approximate $q^*_{(u,ex)}$ to within desired precision.

Obviously, the qualitative termination problem is a special case of the quantitative problem, 
setting r = 1. As mentioned, for multi-exit RCSGs these are all undecidable. Thus we focus 
on single-exit RCSGs (1-RCSGs), where every component has one exit. Since for 1-RCSGs 
it is always clear which exit we wish to terminate at starting at vertex u (there is only one 
exit in $u$'s component), we abbreviate $q^*_{(u,ex)}$, $q^{*,\sigma}_{(u,ex)}$, etc., as $q^*_u$, $q^{*,\sigma}_u$, etc., and we likewise
abbreviate other subscripts.

A different "qualitative" problem is to ask whether $q^*_u = 0$? As we will show in Proposition [33J
this is an easy problem: deciding whether $q^*_u = 0$ for a vertex $u$ in a 1-RCSG can
be done in polynomial time, and only depends on the transition structure of the 1-RCSG, 
not on the specific probabilities. 

As mentioned in the introduction, it is known that for concurrent stochastic games, 
probabilistic nodes do not add any power, and can in effect be "simulated" by concurrent 
nodes alone (this fact was communicated to us by K. Chatterjee [6]). The same fact is true 
for 1-RCSGs. Specifically, the following holds: 

Proposition 2.1. There is a P-time reduction F, which, given a finite CSG (or a 1- 
RCSG), G, computes a finite concurrent game (or 1-RCG, respectively) F(G), without any 
probabilistic vertices, such that the value of the game G is exactly the same as the value of
the game F(G). 

Proof. First, suppose for now that in $G$ all probabilistic transitions have probability 1/2.
In other words, suppose that for a probabilistic vertex $s \in pl^{-1}(0)$ (which is not an exit or
a call port) in a 1-RCSG, we have two transitions $(s, 1/2, t) \in \delta$ and $(s, 1/2, t') \in \delta$. In the
new game $F(G)$, change $s$ to a play vertex, i.e., let $pl(s) = play$, and let $\Gamma_1^s = \Gamma_2^s = \{a, b\}$,
and replace the probabilistic transitions out of $s$ with the following 4 transitions: $(s, (a, b), t)$,
$(s, (b, a), t)$, $(s, (a, a), t')$ and $(s, (b, b), t')$. Do this for all probabilistic vertices in $G$, thus
obtaining $F(G)$ which contains no probabilistic vertices.

Now, consider any strategy $\sigma$ for player 1 in the original game $G$, and a strategy $\sigma'$ in
the new game $F(G)$ that is consistent with $\sigma$, i.e. for each history ending at an original play
vertex $\sigma'$ has the same distribution as $\sigma$ (and for the other histories ending at probabilistic
vertices it has an arbitrary distribution). For any strategy $\tau$ for player 2 in the game $G$,
consider the strategy, $F(\tau)$, for player 2 in $F(G)$, which is defined as follows: whenever the
play reaches a probabilistic vertex $s$ of $G$ (in any context and with any history) $F(\tau)$ plays
$a$ and $b$ with 1/2 probability each. At all non-probabilistic vertices of $G$, $F(\tau)$ plays exactly
as $\tau$ (and it may use the history, etc.). This way, no matter what player 1 does, whenever
play reaches the vertex $s$ (in any context) the play will move from $s$ to $t$ and to $t'$ with
probability 1/2 each. Thus for any vertex $u$, the value $q^{*,\sigma,\tau}_u$ in the game $G$ is the same
as the value $q^{*,\sigma',F(\tau)}_u$ in the game $F(G)$. So the optimal payoff value for player 1 in the
game starting at any vertex $u$ is not greater in $F(G)$ than in $G$. A completely symmetric
argument shows that for player 2 the optimal payoff value starting at $u$ is not greater in
$F(G)$ than in $G$. Thus, the value of the game starting at $u$ is the same in both games.
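The key step of this argument, that a uniformly mixing player 2 turns the new play vertex back into a fair coin regardless of what player 1 does, can be checked with a few lines of exact arithmetic. The following is a minimal sketch; the function name `prob_move_to_t` and the sampled mixes are illustrative choices, not part of the construction itself.

```python
from fractions import Fraction

def prob_move_to_t(x_a, y_a):
    """Probability that play moves from s to t in F(G).

    x_a: probability that player 1 plays move a at s.
    y_a: probability that player 2 plays move a at s.
    The joint moves (a, b) and (b, a) lead to t; (a, a) and (b, b) lead to t'.
    """
    return x_a * (1 - y_a) + (1 - x_a) * y_a

half = Fraction(1, 2)
# Whatever mix player 1 uses, a uniform player 2 makes the outcome a fair coin.
player1_mixes = [Fraction(i, 10) for i in range(11)]
outcomes = [prob_move_to_t(x, half) for x in player1_mixes]
```

By symmetry of the expression, the same holds with the roles swapped: if player 1 mixes uniformly, player 2 cannot bias the outcome either.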

We can now generalize this to arbitrary rational probabilities on transitions, instead of
just probability 1/2, by using a basic trick to encode arbitrary finite probability distributions
using a polynomial-sized finite Markov chain all of whose transitions have probability 1/2.
Namely, suppose $u$ goes to $v_1$ with probability $p/q$ and to $v_2$ with probability $1 - p/q$, where
$p, q$ are integers with $k$ bits (we can write both as $k$-bit numbers, by adding leading 0's to $p$
if necessary so that it has length exactly $k$, same as $q$). Flip (at most) $k$ coins. View this as
generating a $k$-bit binary number. If the number that comes out is $< p$ (i.e. $0, \ldots, p - 1$),
then go to $v_1$, if between $p$ and $q$ (i.e., $p, \ldots, q - 1$) then go to $v_2$, if $\geq q$ go back to the
start, $u$. A naive way to do this would require exponentially many states in $k$. But we only
need at most $2k$ states to encode this if we don't necessarily flip all $k$ coins but rather do
the transition to $v_1$, $v_2$ or $u$ as soon as the outcome is clear from the coin flips. That is,
if the sequence $\alpha$ formed by the initial sequence of coin flips so far differs from both the
prefixes $p', q'$ of $p$ and $q$ of the same length, then we do the transition: if $\alpha < p'$ transition
to $v_1$, if $p' < \alpha < q'$ transition to $v_2$, and if $\alpha > q'$ then transition to $u$. Thus, we only need
to remember the number $j$ of coins flipped so far, and if $j$ is greater than the length of the
common prefix of $p$ and $q$ then we need to remember also whether the coin flips so far agree
with $p$ or with $q$.

Clearly, a simple generalization of this argument works for generating arbitrary finite
rational probability distributions $p_1/q, p_2/q, \ldots, p_r/q$, such that $\sum_{i=1}^r (p_i/q) = 1$. If $q$ is a
$k$-bit integer, then the number of new states needed is at most $rk$, i.e. linear in the encoding
length of the rationals $p_1/q, \ldots, p_r/q$. □
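The early-stopping coin-flip encoding for a two-way branch with probabilities $p/q$ and $1 - p/q$ can be sketched as follows. This is an illustrative reading of the construction, assuming $0 \leq p \leq q < 2^k$; the names `one_round` and `prob_v1` and the way undecided prefixes are tracked are choices made for this sketch, not taken from the proof.

```python
from fractions import Fraction

def one_round(p, q, k):
    """Exact outcome probabilities of one round of at most k fair coin flips.

    Returns (probs, states): probs maps "v1", "v2", "restart" to probabilities,
    and states counts the undecided prefixes the chain has to remember.
    """
    probs = {"v1": Fraction(0), "v2": Fraction(0), "restart": Fraction(0)}
    live = {0}      # prefixes of length j-1 that still agree with p or with q
    states = 0
    for j in range(1, k + 1):
        p_pref, q_pref = p >> (k - j), q >> (k - j)   # length-j prefixes of p, q
        weight = Fraction(1, 2 ** j)                  # probability of any length-j prefix
        states += len(live)
        next_live = set()
        for a in live:
            for bit in (0, 1):
                pref = 2 * a + bit
                if j < k and pref in (p_pref, q_pref):
                    next_live.add(pref)               # outcome not yet determined
                elif pref < p_pref:
                    probs["v1"] += weight             # final number is below p
                elif pref < q_pref:
                    probs["v2"] += weight             # final number is in p .. q-1
                else:
                    probs["restart"] += weight        # final number is at least q
        live = next_live
    return probs, states

def prob_v1(p, q, k):
    """Probability of eventually reaching v1 when restarts are repeated."""
    probs, _ = one_round(p, q, k)
    return probs["v1"] / (1 - probs["restart"])
```

For example, with $p = 3$, $q = 5$, $k = 3$, a single round reaches $v_1$, $v_2$, and the restart with probabilities 3/8, 2/8, and 3/8, so `prob_v1(3, 5, 3)` equals 3/5, while only 5 undecided prefixes are ever tracked, within the $2k$ bound.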
3. Nonlinear minimax equations for 1-RCSGs

In ([16]) we defined a monotone system $S_A$ of nonlinear min-max equations for 1-RSSGs
(i.e. the case of simple games), and showed that its least fixed point solution yields the
desired probabilities $q^*_u$. Here we generalize these to nonlinear minimax systems for concurrent
games, 1-RCSGs. Let us use a variable $x_u$ for each unknown $q^*_u$, and let $x$ be the
vector of all $x_u$, $u \in Q$. The system $S_A$ has one equation of the form $x_u = P_u(x)$ for each
vertex $u$. Suppose that $u$ is in component $A_i$ with (unique) exit $ex$. There are 4 cases based
on the "Type" of $u$.

(1) $u \in Type_1$: $u = ex$. In this case: $x_u = 1$.

(2) $u \in Type_{rand}$: $pl(u) = 0$ and $u \in (N_i \setminus \{ex\}) \cup Return^i$. Then the equation is
$x_u = \sum_{\{v \mid (u, p_{u,v}, v) \in \delta\}} p_{u,v} x_v$. (If $u$ has no outgoing transitions, this equation is by definition
$x_u = 0$.)

(3) $u \in Type_{call}$: $u = (b, en)$ is a call port. The equation is $x_{(b,en)} = x_{en} \cdot x_{(b,ex')}$, where
$ex' \in Ex_{Y(b)}$ is the unique exit of $A_{Y(b)}$.

(4) $u \in Type_{play}$. Then the equation is $x_u = Val(A_u(x))$, where the right-hand side is
defined as follows. Given a value vector $x$, and a play vertex $u$, consider the zero-sum
matrix game given by matrix $A_u(x)$, whose rows are indexed by player 1's moves
$\Gamma_1^u$ from node $u$, and whose columns are indexed by player 2's moves $\Gamma_2^u$. The payoff
to player 1 under the pair of deterministic moves $\gamma_1 \in \Gamma_1^u$, and $\gamma_2 \in \Gamma_2^u$, is given by
$(A_u(x))_{\gamma_1,\gamma_2} := x_v$, where $(u, (\gamma_1, \gamma_2), v) \in \delta$. Let $Val(A_u(x))$ be the value of this zero-sum
matrix game. By von Neumann's minimax theorem, the value and optimal mixed
strategies exist, and they can be obtained by solving a Linear Program with coefficients
given by the $x_i$'s.
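For intuition about the $Val(\cdot)$ operator in case (4), the smallest interesting instance, a $2 \times 2$ matrix game, can be solved without a linear program. The sketch below swaps in the standard closed form for $2 \times 2$ zero-sum games in place of the general LP; the function name `val_2x2` is hypothetical and the code is only meant as an illustration for this special case.

```python
from fractions import Fraction

def val_2x2(m):
    """Value of the zero-sum game with 2x2 payoff matrix m (row player maximizes)."""
    (a, b), (c, d) = [[Fraction(v) for v in row] for row in m]
    maximin = max(min(a, b), min(c, d))   # best pure guarantee for the row player
    minimax = min(max(a, c), max(b, d))   # best pure guarantee for the column player
    if maximin == minimax:
        return maximin                    # saddle point in pure strategies
    # No pure saddle point: both players must mix, and the value has a closed form.
    return (a * d - b * c) / (a + d - b - c)
```

For instance, `val_2x2([[1, 0], [0, 1]])` is 1/2 (both players mix uniformly), and lowering one entry, as in `val_2x2([[Fraction(1, 2), 0], [0, 1]])`, lowers the value to 1/3.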

In vector notation, we denote the system $S_A$ by $x = P(x)$. Given 1-exit RCSG $A$, we can
easily construct this system. Note that the operator $P : \mathbb{R}^n_{\geq 0} \mapsto \mathbb{R}^n_{\geq 0}$ is monotone: for
$x, y \in \mathbb{R}^n_{\geq 0}$, if $x \leq y$ then $P(x) \leq P(y)$. This follows because for two game matrices $A$ and
$B$ of the same dimensions, if $A \leq B$ (i.e., $A_{i,j} \leq B_{i,j}$ for all $i$ and $j$), then $Val(A) \leq Val(B)$.
Note that by definition of $A_u(x)$, for $x \leq y$, $A_u(x) \leq A_u(y)$.

Example 3.1. We now construct the system of nonlinear minimax functional equations,
$x = P(x)$, associated with the 1-RCSG we encountered in Figure 1 (see Example 1). We
shall need one variable for every vertex of that 1-RCSG, to represent the value of the
termination game starting at that vertex, and we will need one equation for each such
variable. Thus, the variables we need are $x_s, x_t, x_{u_1}, \ldots, x_{u_5}, x_{(b_1,s)}, x_{(b_1,t)}, x_{(b_2,s)}, x_{(b_2,t)}$.
The equations are as follows:



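$x_t = 1$
$x_s = (1/2) x_{(b_1,s)} + (1/4) x_t + (1/4) x_{u_1}$
$x_{u_1} = Val(A_{u_1}(x))$, where $A_{u_1}(x)$ is the $2 \times 2$ matrix given by the transitions out of $u_1$ in Figure 1
$x_{u_2} = x_{(b_2,s)}$
$x_{u_3} = (1/2) x_{u_2} + (1/2) x_t$
$x_{u_4} = Val([\, x_{(b_2,s)} \;\; x_t \,])$
$x_{u_5} = x_{u_5}$
$x_{(b_1,s)} = x_s x_{(b_1,t)}$
$x_{(b_1,t)} = x_{(b_2,s)}$
$x_{(b_2,s)} = x_s x_{(b_2,t)}$
$x_{(b_2,t)} = x_t$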

We now identify a particular solution to $x = P(x)$, called the Least Fixed Point (LFP)
solution, which gives precisely the termination game values. Define $P^1(x) = P(x)$, and
define $P^k(x) = P(P^{k-1}(x))$ for $k > 1$. Let $q^* \in \mathbb{R}^n$ denote the n-vector $q^*_u, u \in Q$ (using
the same indexing as used for $x$). For $k \geq 0$, let $q^k$ denote, similarly, the n-vector $q^k_u, u \in Q$.

Theorem 3.2. Let $x = P(x)$ be the system $S_A$ associated with 1-RCSG $A$. Then $q^* = P(q^*)$,
and for all $q' \in \mathbb{R}^n_{\geq 0}$, if $q' = P(q')$, then $q^* \leq q'$ (i.e., $q^*$ is the Least Fixed Point of
$P : \mathbb{R}^n_{\geq 0} \mapsto \mathbb{R}^n_{\geq 0}$). Moreover, $\lim_{k \to \infty} P^k(\mathbf{0}) \uparrow q^*$, i.e., the "value iteration" sequence $P^k(\mathbf{0})$
converges monotonically to the LFP, $q^*$.
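To make the "value iteration" statement concrete, the following sketch iterates $P$ from the all-zero vector on a small hypothetical system that uses each of the four equation types once. The vertex names (`t`, `s`, `c`, `r`, `u`) and the system itself are invented for illustration and are not the system of Example 3.1. Substituting the other equations into the one for `s` gives $x_s = 1/3 + (2/3) x_s^2$, whose fixed points are 1/2 and 1, so the least fixed point has $x_s = 1/2$.

```python
def apply_P(x):
    """One application of the operator P for a small hypothetical system."""
    return {
        "t": 1.0,                                   # exit vertex: x_t = 1
        "s": (1 / 3) * x["t"] + (2 / 3) * x["c"],   # probabilistic vertex
        "c": x["s"] * x["r"],                       # call port c = (b, s): x_s * x_(b, t)
        "r": x["u"],                                # return port r = (b, t), moves to u
        "u": min(x["s"], x["t"]),                   # play vertex: value of the 1x2 game [x_s  x_t]
    }

def value_iteration(steps):
    """Return the sequence 0, P(0), P^2(0), ..., P^steps(0)."""
    x = {v: 0.0 for v in ("t", "s", "c", "r", "u")}
    history = [x]
    for _ in range(steps):
        x = apply_P(x)
        history.append(x)
    return history
```

Running `value_iteration(500)` yields a componentwise non-decreasing sequence whose `s` entry approaches 1/2 from below, never the larger fixed point 1. Floating-point numbers are used here only for brevity.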

Proof. We first prove that $q^* = P(q^*)$. Suppose $q^* \neq P(q^*)$. The equations for vertices
$u$ of types $Type_1$, $Type_{rand}$, and $Type_{call}$ can be used to define precisely the values $q^*_u$ in
terms of other values $q^*_v$. Thus, the only possibility is that $q^*_u \neq P_u(q^*)$ for some vertex $u$
of $Type_{play}$. In other words, $q^*_u \neq Val(A_u(q^*))$.

Suppose q* < Val(A u (q*)). To see that this can't happen, we construct a strategy a for 
player 1 that achieves better. At node u, let player l's strategy a play in one step its optimal 
randomized minimax strategy in the game A u (q*) (which exists according to the minimax 
theorem). Choose e > such that e < V&l(A u (q*)) — g*. After the first step, at any vertex v 
player l's strategy a will play in such a way that achieves a value > q* — e (i.e, an e-optimal 
strategy in the rest of the game, which must exist because the game is determined). Let 
e be an n-vector every entry of which is e. Now, the matrix game A u (q* — e) is just an 
additive translation of the matrix game A u (q*), and thus it has precisely the same e-optimal 
strategies as the matrix game A u (q*), and moreover V&l(A u (q* — e)) = Val(^4 u (g*)) — e. 
Thus, by playing strategy a, player 1 guarantees a value which is > Y&l(A u (q* — e)) = 
Y&\{A u (q*)) — e > g*, which is a contradiction. Thus g* > V&l(A u (q*)). 

A completely analogous argument works for player 2, and shows that g* < Val(A u (q*)). 
Thus q* u = Val(A u (g*)), and hence g* = P(q*). 

Next, we prove that if $q'$ is any vector such that $q' = P(q')$, then $q^* \leq q'$. Let $\tau'$ be the randomized stackless and memoryless strategy for player 2 that always picks, at any state $\langle \beta, u \rangle$, for play vertex $u \in Q_{play}$, a mixed 1-step strategy which is an optimal strategy in the matrix game $A_u(q')$. (Again, the existence of such a strategy is guaranteed by the minimax theorem.)

Lemma 3.3. For all strategies $\sigma \in \Psi_1$ of player 1, and for all $k \geq 0$, $q^{k,\sigma,\tau'} \leq q'$.

Proof. By induction. The base case $q^{0,\sigma,\tau'} \leq q'$ is trivial.

(1) $Type_1$. If $u = ex$ is an exit, then for all $k \geq 0$, clearly $q^{k,\sigma,\tau'}_{ex} = q'_{ex} = 1$.

(2) $Type_{rand}$. Let $\sigma'$ be the strategy defined by $\sigma'(\beta) = \sigma(\langle \epsilon, u \rangle \beta)$ for all $\beta \in V^*$. Then,

$$q^{k+1,\sigma,\tau'}_u = \sum_v p_{u,v}\, q^{k,\sigma',\tau'}_v \leq \sum_v p_{u,v}\, q'_v = q'_u.$$

(3) $Type_{call}$. In this case, $u = (b, en) \in Call_b$, and $q^{k+1,\sigma,\tau'}_u \leq \sup_\rho q^{k,\rho,\tau'}_{en} \cdot \sup_\rho q^{k,\rho,\tau'}_{(b,ex')}$, where $ex' \in Ex_{Y(b)}$ is the unique exit node of $A_{Y(b)}$. Now, by the inductive assumption, $q^{k,\rho,\tau'} \leq q'$ for all $\rho$. Moreover, since $q' = P(q')$, $q'_u = q'_{en} \cdot q'_{(b,ex')}$. Hence, using these inequalities and substituting, we get

$$q^{k+1,\sigma,\tau'}_u \leq q'_{en}\, q'_{(b,ex')} = q'_u.$$

(4) $Type_{play}$: In this case, starting at $\langle \epsilon, u \rangle$, whatever player 1's strategy $\sigma$ is, it has the property that $q^{k+1,\sigma,\tau'}_u \leq \mathrm{Val}(A_u(q^{k,\sigma',\tau'}))$. By the inductive hypothesis $q^{k,\sigma',\tau'}_v \leq q'_v$, so we are done by induction and by the monotonicity of $\mathrm{Val}(A_u(x))$. □
Now, by the lemma, $q^{*,\sigma,\tau'} = \lim_{k \to \infty} q^{k,\sigma,\tau'} \leq q'$. This holds for any strategy $\sigma \in \Psi_1$. Therefore, $\sup_{\sigma \in \Psi_1} q^{*,\sigma,\tau'}_u \leq q'_u$, for every vertex $u$. Thus, by the determinacy of RCSG games, we have established that $q^*_u = \inf_{\tau \in \Psi_2} \sup_{\sigma \in \Psi_1} q^{*,\sigma,\tau}_u \leq q'_u$, for all vertices $u$. In other words, $q^* \leq q'$. The fact that $\lim_{k \to \infty} P^k(0) \uparrow q^*$ follows from a simple Tarski-Knaster argument. □

Example 2. For the system of equations $x = P(x)$ given in Example 3.1, associated with the 1-RCSG given in Example 1, fairly easy calculations using the equations show that the Least Fixed Point of the system (and thus the game values, starting at the different vertices) is as follows: $q^*_t = q^*_{(b_2,t)} = 1$; $q^*_{u_5} = 0$; $q^*_s = q^*_{u_1} = q^*_{u_2} = q^*_{u_4} = q^*_{(b_1,t)} = q^*_{(b_2,s)} = 0.5$; $q^*_{u_3} = 0.75$; and $q^*_{(b_1,s)} = 0.25$.

In this case the values turn out to be rational and are simple to compute, but in general 
the values may be irrational and difficult to compute, and even if they are rational they 
may require exponentially many bits to represent (in standard notation, e.g., via reduced 
numerator and denominator given in binary) in terms of the size of the input 1-RCSG or 
equation system. 

Furthermore, in this game there are pure optimal (stackless and memoryless) strategies for both players. Specifically, the strategy for player 1 (maximizer) that always plays L from nodes $u_1$ is optimal, and the strategy for player 2 that always plays L from nodes $u_1$ and $u_4$ is optimal. In general for 1-RCSGs, we show randomized stackless and memoryless $\epsilon$-optimal and optimal strategies do exist for players 1 and 2, respectively. However, for player 1 only $\epsilon$-optimal strategies may exist, and although optimal strategies do exist for player 2 they may require randomization using irrational probabilities. This is the case even for finite-state concurrent games. □
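The least fixed point values reported for this example can be sanity-checked numerically by running the value iteration $P^k(0)$ of Theorem 3.2. The following is a minimal sketch, not the authors' procedure. The vertex names, the helper `val_2x2`, and the iteration count are illustrative assumptions. In particular, the 2x2 matrix used at $u_1$ (rows $(x_{u_2}, x_{u_3})$ and $(x_{u_4}, x_{u_5})$) is an assumption chosen to be consistent with the reported values and with the stated optimal pure strategies.

```python
VERTICES = ("t", "s", "u1", "u2", "u3", "u4", "u5", "b1s", "b1t", "b2s", "b2t")


def val_2x2(a, b, c, d):
    """Value of the zero-sum matrix game [[a, b], [c, d]]; the row player maximizes."""
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        # A pure saddle point exists, so pure strategies attain the value.
        return maximin
    # Otherwise both players mix over both moves (standard 2x2 formula).
    return (a * d - b * c) / (a + d - b - c)


def P(x):
    """One application of the equation system (matrix at u1 is an assumption)."""
    return {
        "t": 1.0,
        "s": 0.5 * x["b1s"] + 0.25 * x["t"] + 0.25 * x["u1"],
        "u1": val_2x2(x["u2"], x["u3"], x["u4"], x["u5"]),  # assumed matrix
        "u2": x["b2s"],
        "u3": 0.5 * x["u2"] + 0.5 * x["t"],
        "u4": min(x["b2s"], x["t"]),  # 1x2 game: player 2 picks the column
        "u5": x["u5"],
        "b1s": x["s"] * x["b1t"],
        "b1t": x["b2s"],
        "b2s": x["s"] * x["b2t"],
        "b2t": x["t"],
    }


def value_iteration(steps=2000):
    """Iterate P from the all-zero vector; the iterates increase towards the LFP."""
    x = {v: 0.0 for v in VERTICES}
    for _ in range(steps):
        x = P(x)
    return x


q = value_iteration()
print({v: round(q[v], 6) for v in VERTICES})
```

Value iteration only approximates the least fixed point from below, so this check is numerical evidence rather than a proof, and in general the convergence can be slow.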

We can use the system of equations to establish the following upper bound for computing 
the value of a 1-RCSG termination game: 

Theorem 3.4. The qualitative and quantitative termination problems for 1-exit RCSGs can be solved in PSPACE. That is, given a 1-exit RCSG $A$, vertex $u$ and a rational probability $p$, there is a PSPACE algorithm to decide whether $q^*_u \leq p$ (or $q^*_u > p$, or $q^*_u < p$, etc.). The running time is $O(|A|^{O(n)})$ where $n$ is the number of variables in $x = P(x)$. We can also approximate the vector $q^*$ of values to within a specified number of bits $i$ of precision ($i$ given in unary), in PSPACE and in time $O(i|A|^{O(n)})$.

Proof. Using the system $x = P(x)$, we can express the condition $q^*_u \leq c$ by a sentence in the existential theory of the reals as follows:

$$\exists x_1, \ldots, x_n \; \bigwedge_{i=1}^{n} (x_i = P_i(x_1, \ldots, x_n)) \;\wedge\; \bigwedge_{i=1}^{n} (x_i \geq 0) \;\wedge\; (x_u \leq c)$$

Note that the sentence is true, i.e., there exists a vector $x$ that satisfies the constraints of the above sentence, if and only if the least fixed point $q^*$ satisfies them. The constraints $x_i = P_i(x_1, \ldots, x_n)$ for vertices $i$ of type 1, 2, and 3 (exit, probabilistic vertex and call port) are clearly polynomial equations, as they should be in a sentence of the existential theory of the reals. We only need to show how to express equations of the form $x_v = \mathrm{Val}(A_v(x))$ in the existential theory of reals. We can then appeal to well known results for deciding that theory ([U[27]). But this is a standard fact in game theory (see, e.g., [2], [19], [12] where it is used for finite CSGs). The minimax theorem and its LP encoding allow the predicate "$y = \mathrm{Val}(A_v(x))$" to be expressed as an existential formula $\varphi(y, x)$ in the theory of reals with free variables $y$ and $x_1, \ldots, x_n$, such that for every $x \in \mathbb{R}^n$, there exists a unique $y$ (the game value) satisfying $\varphi(y, x)$. Specifically, the formula includes, besides the free variables $x, y$, existentially quantified variables $z_{\gamma_1}$, $\gamma_1 \in \Gamma^v_1$, and $w_{\gamma_2}$, $\gamma_2 \in \Gamma^v_2$, for the probabilities of the moves of the two players, and the conjunction of the following constraints (recall that each entry $A_v(\gamma_1, \gamma_2)$ of the matrix $A_v$ is a variable $x_{v'}$ where $v'$ is the vertex such that $(v, (\gamma_1, \gamma_2), v') \in \delta$):

$$
\begin{aligned}
& z_{\gamma_1} \geq 0 \text{ for all } \gamma_1 \in \Gamma^v_1; \qquad \sum_{\gamma_1 \in \Gamma^v_1} z_{\gamma_1} = 1; \\
& w_{\gamma_2} \geq 0 \text{ for all } \gamma_2 \in \Gamma^v_2; \qquad \sum_{\gamma_2 \in \Gamma^v_2} w_{\gamma_2} = 1; \\
& \sum_{\gamma_1 \in \Gamma^v_1} A_v(\gamma_1, \gamma_2)\, z_{\gamma_1} \geq y \text{ for all } \gamma_2 \in \Gamma^v_2; \\
& \sum_{\gamma_2 \in \Gamma^v_2} A_v(\gamma_1, \gamma_2)\, w_{\gamma_2} \leq y \text{ for all } \gamma_1 \in \Gamma^v_1.
\end{aligned}
$$

To approximate the vector of game values within given precision we can do binary search using queries of the form $q^*_u \leq c$ for all vertices $u$. □
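To make the constraints of the existential formula for "$y = \mathrm{Val}(A)$" concrete, here is a small sketch that checks whether given witnesses (a mixed strategy for each player) certify a candidate value $y$ for a fixed numeric matrix. The function name `certifies_value` and the list-based encoding are assumptions made for illustration. The sketch only verifies supplied witnesses; it does not search for them, which is the job of an LP solver or a decision procedure for the theory of reals.

```python
from fractions import Fraction


def certifies_value(A, z, w, y):
    """Check the LP-style constraints: z and w witness that y = Val(A).

    A is a list of rows (player 1 picks the row and maximizes),
    z is player 1's mixed strategy, w is player 2's mixed strategy.
    """
    rows, cols = range(len(A)), range(len(A[0]))

    def is_distribution(p):
        return all(q >= 0 for q in p) and sum(p) == 1

    if not (is_distribution(z) and is_distribution(w)):
        return False
    # Player 1's mix z guarantees at least y against every column.
    lower = all(sum(A[i][j] * z[i] for i in rows) >= y for j in cols)
    # Player 2's mix w concedes at most y against every row.
    upper = all(sum(A[i][j] * w[j] for j in cols) <= y for i in rows)
    return lower and upper


# Diagonal game with entries 1 and 1/2: the value is 1/3.
A = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1, 2)]]
mix = [Fraction(1, 3), Fraction(2, 3)]
print(certifies_value(A, mix, mix, Fraction(1, 3)))
print(certifies_value(A, mix, mix, Fraction(1, 4)))
```

Exact rational arithmetic is used so that the equalities and inequalities are checked without rounding, mirroring the fact that the constraints are polynomial in the unknowns.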

Determining the vertices $u$ for which the value $q^*_u$ is 0 is easier and can be done in polynomial time, as in the case of the turn-based 1-RSSGs [17].

Proposition 3.5. Given a 1-RCSG we can compute in polynomial time the set $Z$ of vertices $u$ such that $q^*_u = 0$. This set $Z$ depends only on the structure of the given 1-RCSG and not on the actual values of the transition probabilities.

Proof. From the system of fixed point equations we have the following: (1) all exit nodes are not in $Z$; (2) a probabilistic node $u$ is in $Z$ if and only if all its (immediate) successors $v$ are in $Z$; (3) the call port $u = (b, en)$ of a box $b$ is in $Z$ if and only if the entry node $en$ of the corresponding component $Y(b)$ is in $Z$ or the return port $(b, ex)$ is in $Z$; (4) a play node $u$ is in $Z$ if and only if Player 2 has a move $\gamma_2 \in \Gamma^u_2$ such that for all moves $\gamma_1 \in \Gamma^u_1$ of Player 1, the next node $v$, i.e., the (unique) node $v$ such that $(u, (\gamma_1, \gamma_2), v) \in \delta$, is in $Z$.

Only the last case of a play node $u$ needs an explanation. If Player 2 has such a move $\gamma_2$, then clearly the corresponding column of the game matrix $A_u(q^*)$ has all the entries 0, and the value of the game (i.e., $q^*_u$) is 0. Conversely, if every column of $A_u(q^*)$ has a nonzero entry, then the value of the game with this matrix is positive because, for example, Player 1 can give equal probability to all his moves. Thus, in effect, as far as computing the vertices with zero value is concerned, we can fix the strategy of Player 1 at each play vertex to play at all times all legal moves with equal probability to get a 1-RMDP; a vertex has nonzero value in the given 1-RCSG iff it has nonzero value in the 1-RMDP.

The algorithm to compute the set $Z$ of vertices with value 0 is similar to the case of 1-RSSGs [17]. Initialize $Z$ to $Q \setminus Ex$, the set of non-exit vertices. Repeat the following until there is no change:

• If there is a probabilistic node $u \in Z$ that has a successor not in $Z$, then remove $u$ from $Z$.

• If there is a call port $u = (b, en) \in Z$ such that both the entry node $en$ of the corresponding component $Y(b)$ and the return port $(b, ex)$ of the box are not in $Z$, then remove $u$ from $Z$.

• If there is a play node $u \in Z$ such that for every move $\gamma_2 \in \Gamma^u_2$ of Player 2 there is a move $\gamma_1 \in \Gamma^u_1$ of Player 1 such that the next node $v$ from $u$ under $(\gamma_1, \gamma_2)$ is not in $Z$, then remove $u$ from $Z$.

There are at most $n$ iterations and at the end $Z$ is the set of vertices $u$ such that $q^*_u = 0$. □
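The iterative removal procedure above can be sketched directly in code. This is an illustrative sketch only: the dictionary encoding of a game (tuples tagged `"exit"`, `"rand"`, `"call"`, `"play"`) and the toy game at the end are hypothetical and are not taken from the paper. Return ports are encoded here as ordinary vertices with their own entry in the dictionary.

```python
def zero_value_vertices(game):
    """Return the set Z of vertices whose termination value is 0.

    `game` maps each vertex name to a tuple (hypothetical encoding):
      ("exit",)
      ("rand", [successors reached with positive probability])
      ("call", entry_vertex, return_port)
      ("play", {(move1, move2): successor})
    """
    Z = {u for u, spec in game.items() if spec[0] != "exit"}
    changed = True
    while changed:
        changed = False
        for u in sorted(Z):  # iterate over a snapshot, since Z is modified
            kind = game[u][0]
            if kind == "rand":
                remove = any(v not in Z for v in game[u][1])
            elif kind == "call":
                remove = game[u][1] not in Z and game[u][2] not in Z
            else:  # play node
                moves = game[u][1]
                player2_moves = {g2 for (_, g2) in moves}
                # Every column must contain a successor outside Z.
                remove = all(
                    any(v not in Z for (g1, h2), v in moves.items() if h2 == g2)
                    for g2 in player2_moves
                )
            if remove:
                Z.discard(u)
                changed = True
    return Z


toy_game = {
    "ex": ("exit",),
    "loop": ("rand", ["loop"]),
    "r": ("rand", ["ex", "loop"]),
    "ret": ("rand", ["ex"]),
    "c": ("call", "r", "ret"),
    "c2": ("call", "loop", "ret"),
    "p": ("play", {(0, 0): "ex", (1, 0): "loop", (0, 1): "loop", (1, 1): "loop"}),
    "p2": ("play", {(0, 0): "ex", (1, 0): "loop", (0, 1): "loop", (1, 1): "ex"}),
}
print(sorted(zero_value_vertices(toy_game)))
```

Note that the transition probabilities never enter the computation, in line with the statement that $Z$ depends only on the structure of the game.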

4. Strategy improvement and randomized-SM-determinacy

The proof of Theorem 3.2 implies the following:

Corollary 4.1. In every 1-RCSG termination game, player 2 (the minimizer) has an optimal r-SM strategy.

Proof. Consider the strategy $\tau'$ in the proof of Theorem 3.2, chosen not for just any fixed point $q'$, but for $q^*$ itself. That strategy is r-SM and is optimal. □

Player 1 does not have optimal r-SM strategies, not even in finite concurrent stochastic games (see, e.g., [19], [12]). We next establish that it does have finite r-SM $\epsilon$-optimal strategies, meaning that it has, for every $\epsilon > 0$, an r-SM strategy that guarantees a value of at least $q^*_u - \epsilon$, starting from every vertex $u$ in the termination game. We say that a game is r-SM-determined if, letting $\Psi'_1$ and $\Psi'_2$ denote the set of r-SM strategies for players 1 and 2, respectively, we have $\sup_{\sigma \in \Psi'_1} \inf_{\tau \in \Psi'_2} q^{*,\sigma,\tau}_u = \inf_{\tau \in \Psi'_2} \sup_{\sigma \in \Psi'_1} q^{*,\sigma,\tau}_u$.

Theorem 4.2.

(1) (Strategy Improvement) Starting at any r-SM strategy $\sigma_0$ for player 1, via local strategy improvement steps at individual vertices, we can derive a series of r-SM strategies $\sigma_0, \sigma_1, \sigma_2, \ldots$, such that for all $\epsilon > 0$, there exists $i \geq 0$ such that for all $j \geq i$, $\sigma_j$ is an $\epsilon$-optimal strategy for player 1 starting at any vertex, i.e., $q^{*,\sigma_j}_u \geq q^*_u - \epsilon$ for all vertices $u$.

Each strategy improvement step involves solving the quantitative termination problem for a corresponding 1-RMDP. Thus, for classes where this problem is known to be in P-time (such as linearly-recursive 1-RMDPs, [16]), strategy improvement steps can be carried out in polynomial time.

(2) Player 1 has $\epsilon$-optimal r-SM strategies, for all $\epsilon > 0$, in 1-RCSG termination games.

(3) 1-RCSG termination games are r-SM-determined.

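Proof. Note that (2.) follows immediately from (1.), and (3.) follows because by Corollary 4.1 player 2 has an optimal r-SM strategy and thus $\sup_{\sigma \in \Psi'_1} \inf_{\tau \in \Psi'_2} q^{*,\sigma,\tau}_u = \inf_{\tau \in \Psi'_2} \sup_{\sigma \in \Psi'_1} q^{*,\sigma,\tau}_u$.

Let $\sigma$ be any r-SM strategy for player 1. Consider $q^{*,\sigma}$. First, let us note that if $q^{*,\sigma} = P(q^{*,\sigma})$ then $q^{*,\sigma} = q^*$. This is so because, by Theorem 3.2, $q^* \leq q^{*,\sigma}$, and on the other hand, $\sigma$ is just one strategy for player 1, and for every vertex $u$, $q^*_u = \sup_{\sigma' \in \Psi_1} \inf_{\tau \in \Psi_2} q^{*,\sigma',\tau}_u \geq \inf_{\tau \in \Psi_2} q^{*,\sigma,\tau}_u = q^{*,\sigma}_u$.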

Next we claim that, for all vertices $u \notin Type_{play}$, $q^{*,\sigma}_u$ satisfies its equation in $x = P(x)$. In other words, $q^{*,\sigma}_u = P_u(q^{*,\sigma})$. To see this, note that for vertices $u \notin Type_{play}$, no choice of either player is involved, thus the equation holds by definition of $q^{*,\sigma}$. Thus, the only equations that may fail are those for $u \in Type_{play}$, of the form $x_u = \mathrm{Val}(A_u(x))$. We need the following.

Lemma 4.3. For any r-SM strategy $\sigma$ for player 1, and for any $u \in Type_{play}$, $q^{*,\sigma}_u \leq \mathrm{Val}(A_u(q^{*,\sigma}))$.

Proof. We are claiming that $q^{*,\sigma}_u = \inf_{\tau \in \Psi_2} q^{*,\sigma,\tau}_u \leq \mathrm{Val}(A_u(q^{*,\sigma}))$. The inequality follows because a strategy for player 2 can in the first step starting at vertex $u$ play its optimal strategy in the matrix game $A_u(q^{*,\sigma})$, and thereafter, depending on which vertex $v$ is the immediate successor of $u$ in the play, the strategy can play "optimally" to force at most the value $q^{*,\sigma}_v$. □
Now, suppose that for some $u \in Type_{play}$, $q^{*,\sigma}_u \neq \mathrm{Val}(A_u(q^{*,\sigma}))$. Thus by the lemma $q^{*,\sigma}_u < \mathrm{Val}(A_u(q^{*,\sigma}))$. Consider a revised r-SM strategy for player 1, $\sigma'$, which is identical to $\sigma$, except that locally at vertex $u$ the strategy is changed so that $\sigma'(u) = p^{*,u,\sigma}$, where $p^{*,u,\sigma} \in D(\Gamma^u_1)$ is an optimal mixed minimax strategy for player 1 in the matrix game $A_u(q^{*,\sigma})$. We will show that switching from $\sigma$ to $\sigma'$ will improve player 1's payoff at vertex $u$, and will not reduce its payoff at any other vertex.

Consider a parameterized 1-RCSG, $A(t)$, which is identical to $A$, except that $u$ is a randomizing vertex, all edges out of vertex $u$ are removed, and replaced by a single edge labeled by probability variable $t$ to the exit of the same component, and an edge with remaining probability $1 - t$ to a dead vertex. Fixing the value $t$ determines a 1-RCSG, $A(t)$. Note that if we restrict the r-SM strategies $\sigma$ or $\sigma'$ to all vertices other than $u$, then they both define the same r-SM strategy for the 1-RCSG $A(t)$. For each vertex $z$ and strategy $\tau$ of player 2, define $q^{*,\sigma,\tau,t}_z$ to be the probability of eventually terminating starting from $\langle \epsilon, z \rangle$ in the Markov chain $M^{z,\sigma,\tau}_{A(t)}$. Let $f_z(t) = \inf_{\tau \in \Psi_2} q^{*,\sigma,\tau,t}_z$. Recall that $\sigma'(u) = p^{*,u,\sigma} \in D(\Gamma^u_1)$ defines a probability distribution on the actions available to player 1 at vertex $u$. Thus $p^{*,u,\sigma}(\gamma_1)$ is the probability of action $\gamma_1 \in \Gamma^u_1$. Let $\gamma_2 \in \Gamma^u_2$ be any action of player 2 for the 1-step zero-sum game with game matrix $A_u(q^{*,\sigma})$. Let $w(\gamma_1, \gamma_2)$ denote the vertex such that $(u, (\gamma_1, \gamma_2), w(\gamma_1, \gamma_2)) \in \delta$. Let $h_{\gamma_2}(t) = \sum_{\gamma_1 \in \Gamma^u_1} p^{*,u,\sigma}(\gamma_1)\, f_{w(\gamma_1,\gamma_2)}(t)$.

Lemma 4.4. Fix the vertex $u$. Let $\varphi : \mathbb{R} \to \mathbb{R}$ be any function $\varphi \in \{f_z \mid z \in Q\} \cup \{h_\gamma \mid \gamma \in \Gamma^u_2\}$. The following properties hold:

(1) If $\varphi(t) > t$ at some point $t \in [0, 1]$, then $\varphi(t') > t'$ for all $0 \leq t' < t$.

(2) If $\varphi(t) < t$ at some point $t \in [0, 1]$, then $\varphi(t') < t'$ for all $1 > t' > t$.

Proof. First, we prove this for $\varphi = f_z$, for some vertex $z$.

Note that, once player 1 picks an r-SM strategy, a 1-RCSG becomes a 1-RMDP. By a result of [16], player 2 has an optimal deterministic SM response strategy. Furthermore, there is such a strategy that is optimal regardless of the starting vertex. Thus, for any value of $t$, player 2 has an optimal deterministic SM strategy $\tau_t$, such that for any start vertex $z$, we have $\tau_t = \arg\min_{\tau \in \Psi_2} q^{*,\sigma,\tau,t}_z$. Let $g_{z,\tau}(t) = q^{*,\sigma,\tau,t}_z$, and let $d\Psi_2$ be the (finite) set of deterministic SM strategies of player 2. Then $f_z(t) = \min_{\tau \in d\Psi_2} g_{z,\tau}(t)$. Now, note that the function $g_{z,\tau}(t)$ is the probability of reaching an exit in an RMC starting from a particular vertex. Thus, by [14], $g_{z,\tau}(t) = (\lim_{k \to \infty} R^k(0))_z$ for a polynomial system $x = R(x)$ with non-negative coefficients, but with the additional feature that the variable $t$ appears as one of the coefficients. Since this limit can be described by a power series in the variable $t$ with non-negative coefficients, $g_{z,\tau}(t)$ has the following properties: it is a continuous, differentiable, and non-decreasing function of $t \in [0, 1]$, with continuous and non-decreasing derivative, $g'_{z,\tau}(t)$, and since the limit defines probabilities we also know that for $t \in [0, 1]$, $g_{z,\tau}(t) \in [0, 1]$. Thus $g_{z,\tau}(0) \geq 0$ and $g_{z,\tau}(1) \leq 1$.

Hence, since $g'_{z,\tau}(t)$ is non-decreasing, if for some $t \in [0, 1]$, $g_{z,\tau}(t) > t$, then for all $t' < t$, $g_{z,\tau}(t') > t'$. To see this, note that if $g_{z,\tau}(t) > t$ and $g'_{z,\tau}(t) \geq 1$, then for all $t'' > t$, $g_{z,\tau}(t'') > t''$, which contradicts the fact that $g_{z,\tau}(1) \leq 1$. Thus $g'_{z,\tau}(t) < 1$, and since $g'_{z,\tau}$ is non-decreasing, it follows that $g'_{z,\tau}(t') < 1$ for all $t' \leq t$. Since $g_{z,\tau}(t) > t$, we also have $g_{z,\tau}(t') > t'$ for all $t' < t$.

Similarly, if $g_{z,\tau}(t) < t$ for some $t$, then $g_{z,\tau}(t'') < t''$ for all $t'' \in [t, 1)$. To see this, note that if for some $t'' > t$, $t'' < 1$, $g_{z,\tau}(t'') = t''$, then since $g'_{z,\tau}$ is non-decreasing and $g_{z,\tau}(t) < t$, it must be the case that $g'_{z,\tau}(t'') > 1$. But then $g_{z,\tau}(1) > 1$, which is a contradiction.
It follows that $f_z(t)$ has the same properties, namely: if $f_z(t) > t$ at some point $t \in [0, 1]$ then $g_{z,\tau}(t) > t$ for all $\tau$, and hence for all $t' < t$ and for all $\tau \in d\Psi_2$, $g_{z,\tau}(t') > t'$, and thus $f_z(t') > t'$ for all $t' \in [0, t]$. On the other hand, if $f_z(t) < t$ at $t \in [0, 1]$, then there must be some $\tau' \in d\Psi_2$ such that $g_{z,\tau'}(t) < t$. Hence $g_{z,\tau'}(t'') < t''$ for all $t'' \in [t, 1)$, and hence $f_z(t'') < t''$ for all $t'' \in [t, 1)$.

Next we prove the lemma for every $\varphi = h_\gamma$, where $\gamma \in \Gamma^u_2$. For every value of $t$, there is one SM strategy $\tau_t$ of player 2 (depending only on $t$) that minimizes simultaneously $g_{z,\tau}(t)$ for all nodes $z$. So $h_\gamma(t) = \min_\tau r_{\gamma,\tau}(t)$, where $r_{\gamma,\tau}(t) = \sum_{\gamma_1 \in \Gamma^u_1} p^{*,u,\sigma}(\gamma_1)\, g_{w(\gamma_1,\gamma),\tau}(t)$ is a convex combination (i.e., a "weighted average") of some $g$ functions at the same point $t$. The function $r_{\gamma,\tau}$ (for any subscript) inherits the same properties as the $g$'s: continuous, differentiable, non-decreasing, with continuous non-decreasing derivatives, and $r_{\gamma,\tau}$ takes value between 0 and 1. As we argued for the $g$ functions, in the same way it follows that $r_{\gamma,\tau}$ has properties 1 and 2. Also, as we argued for the $f$'s based on the $g$'s, it follows that the $h$'s also have the same properties, based on the $r$'s. □

Now let $t_1 = q^{*,\sigma}_u$, and let $t_2 = \mathrm{Val}(A_u(q^{*,\sigma}))$. By assumption, $t_2 > t_1$. Observe that $f_z(t_1) = q^{*,\sigma}_z$ for every vertex $z$. Thus, $h_{\gamma_2}(t_1) = \sum_{\gamma_1 \in \Gamma^u_1} p^{*,u,\sigma}(\gamma_1)\, f_{w(\gamma_1,\gamma_2)}(t_1) = \sum_{\gamma_1 \in \Gamma^u_1} p^{*,u,\sigma}(\gamma_1)\, q^{*,\sigma}_{w(\gamma_1,\gamma_2)}$. But since, by definition, $p^{*,u,\sigma}$ is an optimal strategy for player 1 in the matrix game $A_u(q^{*,\sigma})$, it must be the case that for every $\gamma_2 \in \Gamma^u_2$, $h_{\gamma_2}(t_1) \geq t_2$, for otherwise player 2 could play a strategy against $p^{*,u,\sigma}$ which would force a payoff lower than the value of the game. Thus $h_{\gamma_2}(t_1) \geq t_2 > t_1$, for all $\gamma_2$. This implies that $h_{\gamma_2}(t) > t$ for all $t < t_1$ by Lemma 4.4, and for all $t_1 \leq t < t_2$, because $h_{\gamma_2}$ is non-decreasing. Thus, $h_{\gamma_2}(t) > t$ for all $t < t_2$.

Let $t_3 = q^{*,\sigma'}_u$. Let $\tau'$ be an optimal global strategy for player 2 against $\sigma'$; by [16], we may assume $\tau'$ is a deterministic SM strategy. Let $\gamma'$ be player 2's action in $\tau'$ at node $u$. Then the value of any node $z$ under the pair of strategies $\sigma'$ and $\tau'$ is $f_z(t_3)$, and thus since $h_{\gamma'}(t_3)$ is a weighted average of $f_z(t_3)$'s for some set of $z$'s, we have $h_{\gamma'}(t_3) = t_3$. Thus, by the previous paragraph, it must be that $t_3 \geq t_2$, and we know $t_2 > t_1$. Thus, $t_3 = q^{*,\sigma'}_u \geq \mathrm{Val}(A_u(q^{*,\sigma})) > t_1 = q^{*,\sigma}_u$. We have shown:

Lemma 4.5. $q^{*,\sigma'}_u \geq \mathrm{Val}(A_u(q^{*,\sigma})) > q^{*,\sigma}_u$.

Note that since $t_3 > t_1$, and $f_z$ is non-decreasing, we have $f_z(t_3) \geq f_z(t_1)$ for all vertices $z$. But then $q^{*,\sigma'}_z = f_z(t_3) \geq f_z(t_1) = q^{*,\sigma}_z$ for all $z$. Thus, $q^{*,\sigma'} \geq q^{*,\sigma}$, with strict inequality at $u$, i.e., $q^{*,\sigma'}_u > q^{*,\sigma}_u$. Thus, we have established that such a "strategy improvement" step does yield a strictly better payoff for player 1.

Suppose we conduct this "strategy improvement" step repeatedly, starting at an arbitrary initial r-SM strategy $\sigma_0$, as long as we can. This leads to a (possibly infinite) sequence of r-SM strategies $\sigma_0, \sigma_1, \sigma_2, \ldots$. Suppose moreover, that during these improvement steps we always "prioritize" among vertices at which to improve so that, among all those vertices $u \in Type_{play}$ which can be improved, i.e., such that $q^{*,\sigma_i}_u < \mathrm{Val}(A_u(q^{*,\sigma_i}))$, we choose the vertex which has not been improved for the longest number of steps (or one that has never been improved yet). This ensures that, infinitely often, at every vertex at which the local strategy can be improved, it eventually is improved.

Under this strategy improvement regime, we show that $\lim_{i \to \infty} q^{*,\sigma_i} = q^*$, and thus, for all $\epsilon > 0$, there exists a sufficiently large $i \geq 0$ such that $\sigma_i$ is an $\epsilon$-optimal r-SM strategy for player 1. Note that after every strategy improvement step, $i$, which improves at a vertex $u$, by Lemma 4.5 we will have $q^{*,\sigma_{i+1}}_u \geq \mathrm{Val}(A_u(q^{*,\sigma_i}))$. Since our prioritization assures that every vertex that can be improved at any step $i$ will be improved eventually, for all $i \geq 0$ there exists $k \geq 0$ such that $q^{*,\sigma_i} \leq P(q^{*,\sigma_i}) \leq q^{*,\sigma_{i+k}}$. In fact, there is a uniform bound on $k$, namely $k \leq |Q|$, the number of vertices. This "sandwiching" property allows us to conclude that, in the limit, this sequence reaches a fixed point of $x = P(x)$. Note that since $q^{*,\sigma_i} \leq q^{*,\sigma_{i+1}}$ for all $i$, and since $q^{*,\sigma_i} \leq q^*$, we know that the limit $\lim_{i \to \infty} q^{*,\sigma_i}$ exists. Letting this limit be $q'$, we have $q' \leq q^*$. Finally, we have $q' = P(q')$, because letting $i$ go to infinity in all three parts of the "sandwiching" inequalities above, we get $q' \leq \lim_{i \to \infty} P(q^{*,\sigma_i}) \leq q'$. But note that $\lim_{i \to \infty} P(q^{*,\sigma_i}) = P(q')$, because the mapping $P(x)$ is continuous on $\mathbb{R}^n_{\geq 0}$. Thus $q'$ is a fixed point of $x = P(x)$, and $q' \leq q^*$. But since $q^*$ is the least fixed point of $x = P(x)$, we have $q' = q^*$. □
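To illustrate the local improvement step on the simplest possible case, the following sketch runs it on a tiny finite concurrent game with a single play node $u$. The game and all names in the code are assumptions made for illustration: each player has two moves at $u$; if both play move 1, the game terminates with probability `c1`, returns to $u$ with probability `c2`, and dies otherwise; if both play move 2, it terminates with probability `c3`; mismatched moves lead to a dead node. Player 1's r-SM strategy is just the probability `p` of playing move 1. Against a fixed `p`, player 2 has a pure best response, so the value at $u$ is the smaller of the two pure-response values.

```python
def improve(p, c1, c2, c3):
    """One strategy improvement step at the play node u.

    Returns the value at u under the current mix p, and the improved mix.
    """
    # Player 2 always answers with move 1: x = p * (c1 + c2 * x).
    # Player 2 always answers with move 2: x = (1 - p) * c3.
    x = min(p * c1 / (1 - p * c2), (1 - p) * c3)
    # Improved local strategy: optimal mix in the diagonal matrix game
    # [[c1 + c2 * x, 0], [0, c3]], which equalizes the two columns.
    p_next = c3 / (c1 + c2 * x + c3)
    return x, p_next


def strategy_improvement(c1, c2, c3, p=0.5, steps=60):
    """Values at u along the sequence of improved strategies."""
    values = []
    for _ in range(steps):
        x, p = improve(p, c1, c2, c3)
        values.append(x)
    return values


values = strategy_improvement(c1=7 / 46, c2=1 / 16, c3=8 / 23)
print(values[:4], values[-1])
```

In this toy instance the values increase from step to step, as Lemma 4.5 predicts, and approach the game value at $u$. In a general 1-RCSG each step requires solving a 1-RMDP rather than evaluating a closed-form expression.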

We have so far not addressed the complexity of computing or approximating the ($\epsilon$-)optimal strategies for the two players in 1-RCSG termination games. Of course, in general, player 1 (maximizer) need not have any optimal strategies, so it only makes sense to speak about computing $\epsilon$-optimal strategies for it. Moreover, the optimal strategies for player 2 may require randomization that is given by irrational probability distributions over moves, and thus we cannot compute them exactly, so again we must be content to approximate them or answer decision questions about them. It is not hard to see however, by examining the proofs of our theorems, that such decision questions can be answered using queries to the existential theory of reals, and are thus also in PSPACE.



Recall the square-root sum problem (e.g., from [20], [14]): given $(a_1, \ldots, a_n) \in \mathbb{N}^n$ and $k \in \mathbb{N}$, decide whether $\sum_{i=1}^{n} \sqrt{a_i} \geq k$.



Theorem 5.1. There is a P-time reduction from the square-root sum problem to the quantitative termination (decision) problem for finite CSGs.

Proof. Given positive integers $(a_1, \ldots, a_n) \in \mathbb{N}^n$, and $k \in \mathbb{N}$, we would like to check whether $\sum_{i=1}^{n} \sqrt{a_i} \geq k$. We can clearly assume that $a_i > 1$ for all $i$. We will reduce this problem to the problem of deciding whether for a given finite CSG, starting at a given node, the value of the termination game is greater than a given rational value.

Given a positive integer $a > 1$, we will construct a finite CSG, call it gadget $G(a)$, with the property that for a certain node $u$ in $G(a)$ the value of the termination game starting at $u$ is $d + e\sqrt{a}$, where $d$ and $e$ are rationals that depend on $a$, with $e > 0$, and such that we can compute $d$ and $e$ efficiently, in polynomial time, given $a$.

If we can construct such gadgets, then we can do the reduction as follows. Given $(a_1, \ldots, a_n) \in \mathbb{N}^n$, with $a_i > 1$ for all $i$, and given $k \in \mathbb{N}$, make copies of the gadgets $G(a_1), \ldots, G(a_n)$. In each gadget $G(a_i)$ we have a node $u_i$ whose termination value is $d_i + e_i\sqrt{a_i}$, where $d_i$ and $e_i > 0$ are rationals that depend on $a_i$ and can be computed efficiently from $a_i$. Create a new node $s$ and add transitions from $s$ to the nodes $u_i$, $i = 1, \ldots, n$, with probabilities $p_i = E/e_i$, respectively, where $E = 1/(\sum_{i=1}^{n} 1/e_i)$. It is easy to check that the value of termination starting at $s$ is $D + E\sum_{i=1}^{n} \sqrt{a_i}$, where $D = \sum_{i=1}^{n} p_i d_i$. Note that $D$ and $E$ are rational values that we can compute efficiently given the $a_i$'s, so to solve the square root sum problem, i.e., decide whether $\sum_{i=1}^{n} \sqrt{a_i} \geq k$, we can ask whether the value of the termination game starting at node $s$ is $\geq D + Ek$.
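For completeness, the "easy to check" computation can be written out as follows. It assumes that $E$ is the normalizing constant that makes the $p_i$ sum to 1, i.e., $E = (\sum_{i} 1/e_i)^{-1}$, so that $s$ is a legitimate probabilistic node:

$$
\begin{aligned}
\sum_{i=1}^{n} p_i &= E \sum_{i=1}^{n} \frac{1}{e_i} = 1, \\
\sum_{i=1}^{n} p_i \left(d_i + e_i\sqrt{a_i}\right) &= \sum_{i=1}^{n} p_i d_i + \sum_{i=1}^{n} \frac{E}{e_i}\, e_i \sqrt{a_i} = D + E \sum_{i=1}^{n} \sqrt{a_i}.
\end{aligned}
$$

Since $E > 0$, the value at $s$ is at least $D + Ek$ exactly when $\sum_{i=1}^{n} \sqrt{a_i} \geq k$.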

Now we show how to construct the gadget $G(a)$ given a positive integer $a$. $G(a)$ has a play node $u$, the target node $t$, dead node $z$, and probabilistic nodes $v_1$, $v_2$. Nodes $z$ and $t$ are absorbing. At $u$ each player has two moves $\{1, 2\}$. If they play 1, 1 then $u$ goes to $v_1$, if they play 2, 2 then $u$ goes to $v_2$, if they play 1, 2 or 2, 1 then $u$ goes to $z$.

Note that we can write $a$ as $a = m^2 - l$ where $m$ is a small-size rational ($m$ is approximately $\sqrt{a}$) and $l < 1$ is also a small-size rational, and such that we can compute both $m$ and $l$ efficiently given $a$. To see this note that, first, given $a$ we can easily approximate $\sqrt{a}$ from above to within an additive error at most $1/(2a)$ in polynomial time, using standard methods for approximating square roots. In other words, given integer $a > 1$, we can efficiently compute a rational number $m$ such that $0 \leq m - \sqrt{a} \leq 1/(2a)$. We then have

$$
\begin{aligned}
m^2 &\leq \left(\sqrt{a} + 1/(2a)\right)^2 \\
&= a + 1/\sqrt{a} + 1/(4a^2)
\end{aligned}
$$

Since $1/\sqrt{a} + 1/(4a^2) < 1$, we can let $l = m^2 - a$.

Having computed $m$ and $l$, let $c_2 = l/4$, $g = m - 1 - c_2$, and $c_1 = g c_3$, where $0 < c_3 < 1$ is a small-sized rational value such that $c_3 < 1/(2g)$. From node $v_1$ we move with probability $c_1$ to $t$, with probability $c_2$ to $u$, and with the remaining probability to $z$. From node $v_2$ we go with probability $c_3$ to $t$ and $1 - c_3$ to $z$. It is not hard to check that these are legitimate probabilities.

Let $x$ be the value at $u$. We have $x = \mathrm{Val}(A)$, where the $2 \times 2$ matrix $A$ for the one-shot zero-sum matrix game at $u$ has $A_{1,1} = c_1 + c_2 x$, $A_{2,2} = c_3$, and $A_{1,2} = A_{2,1} = 0$. Note that $A_{1,1} > 0$ and $A_{2,2} > 0$. If the optimal strategy of player 1 at $u$ is to play 1 with probability $p$ and 2 with probability $1 - p$, then by basic facts about zero-sum matrix games we must have $0 < p < 1$ and $x = p(c_1 + c_2 x) = (1 - p)c_3$. So $p = c_3/(c_1 + c_2 x + c_3)$, and substituting this expression for $p$ in the equality $x = p(c_1 + c_2 x)$, we have:

$$c_2 x^2 + (g c_3 + c_3 - c_2 c_3)\,x - g (c_3)^2 = 0$$

So,

$$x = \frac{-(g c_3 + c_3 - c_2 c_3) + \sqrt{(g c_3 + c_3 - c_2 c_3)^2 + 4 g c_2 (c_3)^2}}{2 c_2}$$

Note that we must choose the root with + sign to get a positive value.

The discriminant can be written as $(c_3)^2[(g + 1 - c_2)^2 + 4 g c_2]$. The term $(c_3)^2$ will come out from under the square root, as $c_3$, so we care only about the expression in the brackets, which is

$$
\begin{aligned}
(g + 1 - c_2)^2 + 4 g c_2 &= (g + 1)^2 + (c_2)^2 - 2 g c_2 - 2 c_2 + 4 g c_2 \\
&= (g + 1)^2 + (c_2)^2 + 2 g c_2 + 2 c_2 - 4 c_2 \\
&= (g + 1 + c_2)^2 - 4 c_2 \\
&= m^2 - l \\
&= a
\end{aligned}
$$

So $x = d + e\sqrt{a}$, where $d = -(g c_3 + c_3 - c_2 c_3)/(2 c_2)$ and $e = c_3/(2 c_2)$. □
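The gadget computation can be checked mechanically. The sketch below computes the rationals of $G(a)$ with exact arithmetic and then verifies numerically that $x = d + e\sqrt{a}$ is a fixed point of $x = \mathrm{Val}(A)$. Two choices in it are assumptions rather than the paper's prescription: $m$ is taken to be $(\lfloor 2a\sqrt{a} \rfloor + 1)/(2a)$, which lies strictly above $\sqrt{a}$ by at most $1/(2a)$, and $c_3$ is taken to be $1/(2(g+1))$, one admissible value. The function names are hypothetical.

```python
from fractions import Fraction
from math import isqrt, sqrt


def gadget_parameters(a):
    """Rationals (m, l, c1, c2, c3, d, e) for the gadget G(a), integer a > 1."""
    n = 2 * a
    # m = (floor(n * sqrt(a)) + 1) / n, so 0 < m - sqrt(a) <= 1/(2a).
    m = Fraction(isqrt(a * n * n) + 1, n)
    l = m * m - a
    c2 = l / 4
    g = m - 1 - c2
    c3 = 1 / (2 * (g + 1))  # assumed choice: 0 < c3 < 1 and c3 < 1/(2g)
    c1 = g * c3
    d = -(g * c3 + c3 - c2 * c3) / (2 * c2)
    e = c3 / (2 * c2)
    # The bracketed part of the discriminant equals a exactly.
    assert (g + 1 - c2) ** 2 + 4 * g * c2 == a
    return m, l, c1, c2, c3, d, e


def fixed_point_residual(a):
    """|x - Val(A)| at x = d + e*sqrt(a), for A = [[c1 + c2*x, 0], [0, c3]]."""
    _, _, c1, c2, c3, d, e = gadget_parameters(a)
    x = float(d) + float(e) * sqrt(a)
    a11, a22 = float(c1) + float(c2) * x, float(c3)
    # Value of a diagonal 2x2 game with positive diagonal entries.
    return abs(x - a11 * a22 / (a11 + a22))


for a in (2, 3, 10, 17):
    print(a, gadget_parameters(a)[5:], fixed_point_residual(a))
```

The discriminant identity is checked exactly with rationals, while the fixed-point check uses floating point because $\sqrt{a}$ is irrational in general.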

Theorem 5.2. There is a P-time reduction from the quantitative termination (decision) 
problem for finite CSGs to the qualitative termination problem for 1-RCSGs. 

Proof. Consider the 1-RMC depicted in Figure 2. We assume $p_1 + p_2 = 1$. As shown in ([14], Theorem 3), in this 1-RMC the probability of termination starting at $\langle \epsilon, en \rangle$ is $= 1$ if and only if $p_2 \geq 1/2$.
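The threshold $1/2$ can be seen from a one-variable fixed point equation. The following derivation assumes that the 1-RMC of Figure 2 has the standard shape suggested by the construction later in this proof: from the entry, with probability $p_2$ the process moves to the exit, and with probability $p_1$ it makes two successive recursive calls to the same component before exiting. Under that assumption the termination probability $x$ is the least non-negative solution of

$$
x = p_1 x^2 + p_2, \qquad \text{i.e.,} \qquad (x - 1)(p_1 x - p_2) = 0 \quad \text{using } p_1 + p_2 = 1 .
$$

The roots are $1$ and $p_2/p_1$, so the least fixed point is $\min(1, p_2/p_1)$, which equals $1$ exactly when $p_2 \geq p_1$, that is, when $p_2 \geq 1/2$.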
Figure 2: 1-RMC A'

Now, given a finite CSG, $G$, and a vertex $u$ of $G$, do the following: first "clean up" $G$ by removing all nodes where the min player (player 2) has a strategy to achieve probability 0. We can do this in polynomial time as follows. Note that the only way player 2 can force a probability 0 of termination is if it has a strategy $\tau$ such that, for all strategies $\sigma$ of player 1, there is no path in the resulting Markov chain from the start vertex $u$ to the terminal node. But this can only happen if, ignoring probabilities, player 2 can play in such a way as to avoid the terminal vertex. This can be checked easily in polynomial time.

The revised CSG will have two designated terminal nodes, the old terminal node, labeled "1", and another terminal node labeled "0". From every node $v$ of $Type_{rand}$ in the revised CSG which does not carry full probability on its outedges, we direct all the "residual" probability to "0", i.e., we add an edge from $v$ to "0" with probability $p_{v,\text{"0"}} = 1 - \sum_w p_{v,w}$, where the sum is over all remaining nodes $w$ in the CSG.

Let $\epsilon > 0$ be a value that is strictly less than the least probability, over all vertices, under any strategy for player 2, of reaching the terminal node. Obviously such an $\epsilon > 0$ exists in the revised CSG, because by Corollary 4.1 (specialized to the case of finite CSGs) player 2 has an optimal randomized S&M strategy. Fixing that strategy $\tau$, player 1 can force termination from vertex $u$ with positive probability $q^{*,\tau}_u$. We take $\epsilon = (\min_u q^{*,\tau}_u)/2$. (We do not need to compute $\epsilon$; we only need its existence for the correctness proof of the reduction.)

In the resulting finite CSG, we know that if player 1 plays $\epsilon$-optimally (which it can do with randomized S&M strategies), and player 2 plays arbitrarily, there is no bottom SCC in the resulting finite Markov chain other than the two designated terminating nodes "0" and "1". In other words, all the probability exits the system, as long as the maximizing player plays $\epsilon$-optimally.

Now, take the remaining finite CSG, call it $G'$. Just put a copy of $G'$ at the entry of the component $A_1$ of the 1-RMC in Figure 2, identifying the entry $en$ with the initial node, $u$, of $G'$. Take every transition that is directed into the terminal node "1" of $G$, and instead direct it to the exit $ex$ of the component $A_1$. Next, take every edge that is directed into the terminal "0" node and direct it to the first call port, $(b_1, en)$, of the left box $b_1$. Both boxes map to the unique component $A_1$. Call this 1-RCSG $A$.

We now claim that the value is $\geq 1/2$ in the finite CSG $G'$ for terminating at the terminal "1" iff the value is $= 1$ for terminating in the resulting 1-RCSG, $A$. The reason is clear: after cleaning up the CSG, we know that under an $\epsilon$-optimal strategy for the maximizer for reaching "1", all the probability exits $G'$ either at "1" or at "0". We also know that the supremum value that the maximizing player can attain will have value 1 iff the supremum probability it can attain for going directly to the exit of the component in $A$ is $\geq 1/2$, but this is precisely the supremum probability that maximizer can attain for going to "1" in $G'$.

Lastly, note that the fact that the quantitative probability was taken to be 1/2 for the finite CSG is without loss of generality. Given a finite CSG $G$ and a rational probability $p$, $0 < p < 1$, it is easy to efficiently construct another finite CSG $G'$ such that the termination probability for $G$ is $\geq p$ iff the termination probability for $G'$ is $\geq 1/2$. □

6. Conclusions 

We have studied Recursive Concurrent Stochastic Games (RCSGs), and we have shown that for 1-exit RCSGs with the termination objective we can decide both quantitative and qualitative problems associated with computing their values in PSPACE, using decision procedures for the existential theory of reals, whereas any substantial improvement (even to NP) of this complexity, even for their qualitative problem, would resolve a long standing open problem in exact numerical computation, namely the square-root sum problem. Furthermore, we have shown that the quantitative decision problem for finite-state concurrent stochastic games is also at least as hard as the square-root sum problem.

An important open question is whether approximation of the game values, to within a 
desired additive error e > 0, for both finite-state concurrent games and for 1-RCSGs, can be 
done more efficiently. Our lower bounds (with respect to square-root sum) do not address 
the approximation question, and it still remains open whether (a suitably formulated gap 
decision problem associated with) approximating the value of even finite-state CSGs, to 
within a given additive error ε > 0, is in NP. 

In [16], we showed that model checking linear-time (ω-regular or LTL) properties for 
1-RMDPs (and thus also for 1-RSSGs) is undecidable, and that even the qualitative or 
approximate versions of such linear-time model checking questions remain undecidable. 
Specifically, for any ε > 0, given as input a 1-RMDP and an LTL property, φ, it is 
undecidable to determine whether the optimal probability with which the controller can force 
(using its strategy) the executions of the 1-RMDP to satisfy φ, is probability 1, or is at 
most probability ε, even when we are guaranteed that the input satisfies one of these two 
cases. Of course these undecidability results extend to the more general 1-RCSGs. 

On the other hand, building on our polynomial time algorithms for the qualitative 
termination problem for 1-RMDPs in [17], Brazdil et al. [4] showed decidability (in 
P-time) for the qualitative problem of deciding whether there exists a strategy under which 
a given target vertex (which may not be an exit) of a 1-RMDP is reached in any calling 
context (i.e., under any call stack) almost surely (i.e., with probability 1). They then used 
this decidability result to show that the qualitative model checking problem for 1-RMDPs 
against a qualitative fragment of the branching time probabilistic temporal logic PCTL is 
decidable. 

In the setting of 1-RCSGs (and even 1-RSSGs), it remains an open problem whether 
the qualitative problem of reachability of a vertex (in any calling context) is decidable. 
Moreover, it should be noted that even for 1-RMDPs, the problem of deciding whether the 
value of the reachability game is 1 is not known to be decidable. This is because although 
the result of [4] shows that it is decidable whether there exists a strategy that achieves 
probability 1 for reaching a desired vertex, there may not exist any optimal strategy for 
this reachability problem; in other words, the value may be 1 but it may only be attained 
as the supremum value achieved over all strategies. 